Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add prepare_sector_split.R #30

Merged
merged 10 commits into from
Aug 1, 2024
Merged

add prepare_sector_split.R #30

merged 10 commits into from
Aug 1, 2024

Conversation

jacobvjk
Copy link
Member

@jacobvjk jacobvjk commented Jul 30, 2024

closes #9
closes #10

  • workflow.multi.loanbook gains module prepare_sector_split.R, which calculates equal weights and primary energy based sector weights for companies
  • deprecates the "worst_case" sector split, which was an option in the previous workflow.aggregate.loanbooks. This option has not been used and is unnecessarily complicated.
  • beyond that deprecation, the script maintains the functionality from workflow.aggregate.loanbooks
  • run_match_prioritize.R gains an option to split loan values based on outputs of prepare_sector_split.R. Option is set via config.yml
  • auxiliary data sets (primary_energy_efficiency and unit_conversion) needed to calculate the primary energy based sector split are integrated directly into prepare_sector_split.R. Previously, they were generated in data-raw/, but since we are currently not using a package structure and the data sets are not expected to be used elsewhere, they are integrated directly in the script for now.
  • prepare_sector_split.R can remove inactive companies based on the output of prepare_abcd.R, in case the option remove_inactive_companies == TRUE

@jacobvjk jacobvjk marked this pull request as ready for review July 31, 2024 11:36
@jacobvjk jacobvjk requested review from jdhoffa and cjyetman July 31, 2024 11:36
Copy link
Member

@jdhoffa jdhoffa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly a rubber stamp/ review of syntax, but LGTM

Comment on lines +3 to +4
abcd_id <- abcd %>%
dplyr::distinct(.data$company_id, .data$name_company)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: single line pipe (i don't really care that much, just making note)

Comment on lines +6 to +11
# identify lost_companies_sector_split and write to csv for inspection
lost_companies_sector_split <- companies_sector_split %>%
dplyr::anti_join(
abcd_id,
by = c("company_id")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: single line pipe

Comment on lines +12 to +14
config_files <- config::get("file_names")
config_match_prio <- config::get("match_prioritize")
config_prepare_sector_split <- config::get("sector_split")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this add new/necessary keys to the config? Should that be documented somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, they are already in the example.config.yml

It should be documented, but at this point I prefer getting the full workflow set up before writing an extensive documentation about these. But will open a task for it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

We should consider how exactly we want to slice and dice the architecture here, and what/ where things should be documented (e.g. {config.yml + scripts} could also be {function args + functions} in a package).

The technical profile document will inform this, and ideally we can discuss it a bit more in the Tech Review.

@jacobvjk jacobvjk merged commit c3e825c into main Aug 1, 2024
@jacobvjk jacobvjk deleted the 9-prepare-sector-split branch August 1, 2024 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

apply sector split to match_prioritize add prepare_sector_split.R
2 participants